337 samples and 7577 SNPs from across the northern Gulf and western Atlantic; 4356 SNPs remain after LD thinning.
First, just run a basic PCA with all samples, colored by region:
We get clustering by shallow vs. deep (nearshore vs. offshore) rather than Atlantic vs. Gulf. This suggests that nearshore animals in the Atlantic and Gulf are more similar to each other than geographically proximate nearshore and offshore animals are.
The patterns above look somewhat like an inversion. If that were the case, we'd see the PCA loadings clustered in a single region of the genome. The loadings don't indicate this; instead, they are distributed across the genome.
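One rough way to make this check explicit is to ask what share of the strongest loadings falls on each chromosome. The sketch below is only an illustration: the function name, the `loadings` and `chrom` inputs, and the `top_frac` cutoff are all hypothetical and not taken from the original analysis.

```python
from collections import Counter

def top_loading_share(loadings, chrom, top_frac=0.01):
    """Per chromosome, the fraction of the top-|loading| SNPs it holds.

    loadings: per-SNP loadings for one PC (hypothetical input)
    chrom:    chromosome/scaffold label for each SNP, same order
    """
    n_top = max(1, int(len(loadings) * top_frac))
    # SNP indices sorted by absolute loading, largest first
    order = sorted(range(len(loadings)), key=lambda i: abs(loadings[i]), reverse=True)
    counts = Counter(chrom[i] for i in order[:n_top])
    return {c: n / n_top for c, n in counts.items()}

# Toy data (made up): the three largest loadings all sit on chr1,
# which is the kind of concentration an inversion could produce.
toy_loadings = [0.9, -0.8, 0.1, 0.05, 0.7, 0.02]
toy_chrom = ["chr1", "chr1", "chr2", "chr2", "chr1", "chr3"]
toy_share = top_loading_share(toy_loadings, toy_chrom, top_frac=0.5)
```

If the shares roughly track the number of SNPs per chromosome, the loadings are spread across the genome rather than concentrated in one place.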
Running multiple values of K and computing the cross-validation error for each. The lowest error indicates the most likely K. Cross-validation basically withholds a subset of the genotypes, predicts their values, and compares the predictions to the withheld data.
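As a minimal sketch of that logic: score predictions only on the withheld genotypes, then pick the K with the lowest error. The squared-error score and both function names are assumptions for illustration; the clustering software may use a different error measure, and the CV values in the example are placeholders, not results from this dataset.

```python
def masked_error(genotypes, predicted, mask):
    """Mean squared error between withheld genotypes and their predictions.

    genotypes, predicted, mask: equally shaped lists of rows;
    mask is True where a genotype was withheld.
    """
    sq = [
        (g - p) ** 2
        for g_row, p_row, m_row in zip(genotypes, predicted, mask)
        for g, p, m in zip(g_row, p_row, m_row)
        if m
    ]
    return sum(sq) / len(sq)

def best_k(cv_errors):
    """Return the K with the lowest cross-validation error."""
    return min(cv_errors, key=cv_errors.get)

# Placeholder values only, to show the selection step
example_cv = {2: 0.50, 3: 0.40, 4: 0.45}
chosen_k = best_k(example_cv)
```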
The plot below is sorted with Atlantic on the left and Gulf on the right, then shallow to deep by collection location.
Note that the left group in the Atlantic consists of the nearshore individuals, which sit at the bottom left in the PCAs above.
For the rest of the analyses, I’ll run both the four population analysis as well as a six population analysis, splitting the intermediate and offshore into Gulf and Atlantic. I think this makes sense biologically and is justified given our hypotheses going into the analysis.
Next, testing whether there is isolation by distance (IBD) at various scales. For all tests I'm using PCA-based genetic distance with 64 PCs (based on Shirk et al.), but these values are very similar to the PLINK and Euclidean distances.
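For reference, here is a small sketch of how a PCA-based distance and an IBD test can be put together. It assumes the genetic distance is the Euclidean distance between individuals in PC space, and it uses a Mantel permutation test as one common choice; the notes above don't state which test was actually used, so treat the test, the function names, and the toy data as assumptions.

```python
import math
import random

def pc_distance(scores):
    """Pairwise Euclidean distance between individuals in PC space."""
    n = len(scores)
    return [[math.dist(scores[i], scores[j]) for j in range(n)] for i in range(n)]

def _pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def _upper(mat, order):
    """Upper-triangle entries of mat after reordering individuals."""
    n = len(order)
    return [mat[order[i]][order[j]] for i in range(n) for j in range(i + 1, n)]

def mantel(gen_dist, geo_dist, n_perm=999, seed=1):
    """One-sided Mantel test: correlation between two distance matrices,
    with a p-value from permuting individuals in the genetic matrix."""
    idx = list(range(len(gen_dist)))
    geo = _upper(geo_dist, idx)
    r_obs = _pearson(_upper(gen_dist, idx), geo)
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_perm):
        perm = idx[:]
        rng.shuffle(perm)
        if _pearson(_upper(gen_dist, perm), geo) >= r_obs:
            hits += 1
    return r_obs, (hits + 1) / (n_perm + 1)

# Toy data (made up): five individuals along a transect, with PC scores
# that track position exactly, so the correlation is perfect.
positions = [0.0, 1.0, 2.0, 3.0, 5.0]
toy_scores = [[x, 0.0] for x in positions]
toy_geo = [[abs(a - b) for b in positions] for a in positions]
toy_r, toy_p = mantel(pc_distance(toy_scores), toy_geo)
```

With real data, `scores` would hold the first 64 PC coordinates per individual and `geo_dist` the pairwise geographic distances.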
Across all individuals, there is an IBD signal. But I think this is likely driven by underlying comparisons.
Splitting this out by population across both regions: within a population we generally see IBD, except for Atlantic Offshore. Inter-population comparisons have no signal.
Tt312 and 13Tt073 are divergent and are causing that odd pattern in Offshore Atlantic. I'm not sure why. There may be an argument for dropping them, but I don't think they look unusual in other analyses. Need to double-check missing data, etc., for these individuals.
The Intermediate individuals are possibly hybrids: in the results above, they're intermediate in genetic distance for nearly all analyses. Here, I'll test this using:
The f3-statistic explicitly tests whether a taxon of interest results
from admixture between two others: a significantly negative f3-statistic
supports the admixture hypothesis, while a positive value is not
informative. In our case, the taxon of interest (pop1) is
Intermediate, while pop2 and pop3 are Coastal
and Offshore.
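To make the sign logic concrete, here is a minimal sketch of the basic f3 estimator: the average over SNPs of (c - a)(c - b), where c is the allele frequency in the target population and a and b are the frequencies in the two candidate sources. This naive version leaves out the finite-sample bias correction and block-jackknife standard errors that dedicated software applies, so it is for intuition only; the function name and toy frequencies are made up.

```python
def f3_naive(target, source_a, source_b):
    """Naive f3(target; source_a, source_b) from per-SNP allele frequencies.

    No finite-sample bias correction and no standard error:
    an illustration of the statistic, not a replacement for real software.
    """
    products = [(c - a) * (c - b) for c, a, b in zip(target, source_a, source_b)]
    return sum(products) / len(products)

# Toy case 1: the target sits exactly between two divergent sources,
# so (c - a) and (c - b) have opposite signs and f3 is negative.
f3_admixed = f3_naive([0.5, 0.5], [0.1, 0.9], [0.9, 0.1])

# Toy case 2: the target has drifted away from both sources in the same
# direction, so f3 is positive (uninformative about admixture).
f3_drifted = f3_naive([0.0, 1.0], [0.1, 0.9], [0.2, 0.8])
```

The negative value arises only when the target's frequencies tend to fall between those of the two sources, which is why a negative f3 points to admixture.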
First, I calculated these statistics with the four-population assignments:
| pop1 | pop2 | pop3 | est | se | z | p |
|---|---|---|---|---|---|---|
| Intermediate | Coastal_Atl | Offshore | 0.0037947 | 0.0006491 | 5.8461604 | 0.0000000 |
| Intermediate | Coastal_Gulf | Offshore | -0.0005488 | 0.0005773 | -0.9506458 | 0.3417842 |
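Remember, positive values are not informative; negative values indicate a population resulting from admixture. Neither comparison is significantly negative here: the Coastal_Atl estimate is positive, and the Coastal_Gulf estimate is negative but not significant (p = 0.34).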
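The z and p columns can be reproduced from est and se. The sketch below assumes z = est / se and a two-sided normal p-value, which is what the table values appear to match; small differences come from the rounding of est and se in the table. The function name is hypothetical.

```python
import math

def z_and_p(est, se):
    """z-score and two-sided normal p-value for an estimate and its SE."""
    z = est / se
    p = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return z, p

# Second row of the table: Intermediate; Coastal_Gulf, Offshore
z_gulf, p_gulf = z_and_p(-0.0005488, 0.0005773)  # z about -0.95, p about 0.34
```

Because only negative f3 values are informative, what matters is a clearly negative z, not just a small p.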
Next, I split into 6 populations:
| pop1 | pop2 | pop3 | est | se | z | p |
|---|---|---|---|---|---|---|
| Intermediate Atlantic | ||||||
| Intermediate_Atlantic | Coastal_Atl | Offshore_Atlantic | 0.0057962 | 0.0007623 | 7.6034085 | 0.0000000 |
| Intermediate_Atlantic | Coastal_Atl | Offshore_Gulf | 0.0057595 | 0.0006787 | 8.4857907 | 0.0000000 |
| Intermediate_Atlantic | Coastal_Gulf | Offshore_Atlantic | 0.0006596 | 0.0006667 | 0.9893106 | 0.3225112 |
| Intermediate_Atlantic | Coastal_Gulf | Offshore_Gulf | 0.0020004 | 0.0005960 | 3.3563314 | 0.0007898 |
| Intermediate Gulf | ||||||
| Intermediate_Gulf | Coastal_Atl | Offshore_Atlantic | 0.0004012 | 0.0007493 | 0.5354332 | 0.5923504 |
| Intermediate_Gulf | Coastal_Atl | Offshore_Gulf | 0.0002826 | 0.0006526 | 0.4330949 | 0.6649458 |
| Intermediate_Gulf | Coastal_Gulf | Offshore_Atlantic | -0.0044799 | 0.0006952 | -6.4443067 | 0.0000000 |
| Intermediate_Gulf | Coastal_Gulf | Offshore_Gulf | -0.0032210 | 0.0006094 | -5.2856212 | 0.0000001 |
This indicates that the Intermediate Gulf population is a result of admixture between Coastal Gulf and the Offshore populations (both Atlantic and Gulf). There is no evidence of admixture in the Intermediate Atlantic population.
D-statistics, or ABBA-BABA tests, test for introgression by looking
for deviations from what incomplete lineage sorting alone would produce.
In short, if we have a
tree with an ancestral “A” allele and derived “B” allele in the tree
(((P1,P2),P3),O) where O is the outgroup, we should see the “ABBA” and
“BABA” patterns at equal frequencies when there is incomplete lineage
sorting and no gene flow. If there is an overrepresentation of either
ABBA or BABA, this suggests gene flow (see figure below, from the Dsuite
tutorial).
I ran this test with Dsuite, with Aduncus as
the outgroup. For the output below, P1 and P2 are always arranged so
that D is positive and indicates gene flow between P2 and P3. P1 and P2
could be flipped, which would just flip the sign of D to negative and
indicate gene flow between P1 and P3.
| P1 | P2 | P3 | Dstatistic | Z.score | p.value | BBAA | ABBA | BABA | p.value_multTesting |
|---|---|---|---|---|---|---|---|---|---|
| Coastal_Atl | Coastal_Gulf | Intermediate | 0.1043800 | 8.80336 | 0.0000000 | 148.398 | 127.419 | 103.3330 | 0.0000000 |
| Coastal_Atl | Coastal_Gulf | Offshore | 0.0712124 | 5.25906 | 0.0000001 | 219.905 | 100.627 | 87.2480 | 0.0000006 |
| Intermediate | Coastal_Atl | Offshore | 0.0239574 | 1.42199 | 0.1550280 | 193.157 | 105.564 | 100.6240 | 0.6201120 |
| Intermediate | Coastal_Gulf | Offshore | 0.0930599 | 5.88660 | 0.0000000 | 205.884 | 107.584 | 89.2652 | 0.0000000 |
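As a quick check on how these columns relate, D can be recomputed from the ABBA and BABA counts as (ABBA - BABA) / (ABBA + BABA). The `p.value_multTesting` column in the table above is also consistent with a Bonferroni correction over its four tests (for example, 0.155028 × 4 = 0.620112); that reading is inferred from the numbers rather than stated anywhere in these notes. Function names are hypothetical.

```python
def d_statistic(abba, baba):
    """Patterson's D from ABBA and BABA site-pattern counts."""
    return (abba - baba) / (abba + baba)

def bonferroni(p, n_tests):
    """Bonferroni-adjusted p-value, capped at 1."""
    return min(1.0, p * n_tests)

# First row: P1 = Coastal_Atl, P2 = Coastal_Gulf, P3 = Intermediate
d_row1 = d_statistic(127.419, 103.333)   # about 0.1044, matching Dstatistic

# Third row: P1 = Intermediate, P2 = Coastal_Atl, P3 = Offshore
p_adj_row3 = bonferroni(0.1550280, 4)    # about 0.6201
```

An excess of ABBA over BABA gives a positive D, which under the arrangement described above points to gene flow between P2 and P3.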
With the six-population assignments:

| P1 | P2 | P3 | Dstatistic | Z.score | p.value | BBAA | ABBA | BABA | p.value_multTesting |
|---|---|---|---|---|---|---|---|---|---|
| Coastal_Atl | Coastal_Gulf | Intermediate_Atlantic | 0.1043000 | 8.8125300 | 0.0000000 | 146.849 | 127.6930 | 103.5720 | 0.0000000 |
| Coastal_Atl | Coastal_Gulf | Intermediate_Gulf | 0.1039980 | 8.3291000 | 0.0000000 | 154.582 | 126.1860 | 102.4120 | 0.0000000 |
| Coastal_Atl | Coastal_Gulf | Offshore_Atlantic | 0.0651194 | 4.4594000 | 0.0000082 | 229.247 | 96.8881 | 85.0410 | 0.0001644 |
| Coastal_Atl | Coastal_Gulf | Offshore_Gulf | 0.0761205 | 5.8891200 | 0.0000000 | 211.909 | 103.7900 | 89.1063 | 0.0000001 |
| Intermediate_Atlantic | Coastal_Atl | Intermediate_Gulf | 0.0034596 | 0.2326380 | 0.8160420 | 132.862 | 123.9690 | 123.1140 | 1.0000000 |
| Intermediate_Atlantic | Coastal_Atl | Offshore_Atlantic | 0.0198779 | 1.0956600 | 0.2732280 | 202.461 | 101.4630 | 97.5079 | 1.0000000 |
| Intermediate_Atlantic | Coastal_Atl | Offshore_Gulf | 0.0274589 | 1.6885100 | 0.0913126 | 187.978 | 108.4510 | 102.6550 | 1.0000000 |
| Intermediate_Gulf | Coastal_Atl | Offshore_Atlantic | 0.0199902 | 0.9907250 | 0.3218200 | 195.146 | 102.9920 | 98.9549 | 1.0000000 |
| Intermediate_Gulf | Coastal_Atl | Offshore_Gulf | 0.0273693 | 1.4770900 | 0.1396520 | 180.630 | 109.9980 | 104.1370 | 1.0000000 |
| Offshore_Gulf | Coastal_Atl | Offshore_Atlantic | 0.0418859 | 1.8357300 | 0.0663969 | 136.578 | 115.2220 | 105.9580 | 1.0000000 |
| Intermediate_Atlantic | Coastal_Gulf | Intermediate_Gulf | 0.1052520 | 8.0421600 | 0.0000000 | 138.552 | 129.3110 | 104.6830 | 0.0000000 |
| Intermediate_Atlantic | Coastal_Gulf | Offshore_Atlantic | 0.0830860 | 4.8449300 | 0.0000013 | 216.279 | 102.9970 | 87.1945 | 0.0000253 |
| Intermediate_Atlantic | Coastal_Gulf | Offshore_Gulf | 0.1016310 | 6.3494300 | 0.0000000 | 199.961 | 110.9970 | 90.5169 | 0.0000000 |
| Intermediate_Gulf | Coastal_Gulf | Offshore_Atlantic | 0.0823751 | 4.6976200 | 0.0000026 | 208.444 | 104.3550 | 88.4711 | 0.0000526 |
| Intermediate_Gulf | Coastal_Gulf | Offshore_Gulf | 0.1006260 | 6.2146000 | 0.0000000 | 192.076 | 112.3530 | 91.8088 | 0.0000000 |
| Offshore_Gulf | Coastal_Gulf | Offshore_Atlantic | 0.0973529 | 4.4513000 | 0.0000085 | 143.189 | 118.9830 | 97.8718 | 0.0001707 |
| Intermediate_Gulf | Intermediate_Atlantic | Offshore_Atlantic | 0.0004192 | 0.0332968 | 0.9734380 | 192.947 | 97.6398 | 97.5580 | 1.0000000 |
| Intermediate_Gulf | Intermediate_Atlantic | Offshore_Gulf | 0.0003095 | 0.0245152 | 0.9804420 | 178.944 | 103.3700 | 103.3060 | 1.0000000 |
| Offshore_Gulf | Intermediate_Atlantic | Offshore_Atlantic | 0.0241332 | 1.4828900 | 0.1381040 | 132.257 | 112.6530 | 107.3440 | 1.0000000 |
| Offshore_Gulf | Intermediate_Gulf | Offshore_Atlantic | 0.0242528 | 1.4013900 | 0.1610960 | 130.058 | 110.3830 | 105.1550 | 1.0000000 |
Maximum likelihood tree estimating drift among populations. Migration edges are fit to the tree to improve the fit for populations that the tree model explains poorly. Migration edges are added stepwise. You can estimate the number of migration events that best improves the model fit, similar to the Evanno-type approaches used with STRUCTURE.
First fit the trees with 0 migration events:
Best number of migration events:
The four-population result is consistent and clear. There are two migration edges: one between Intermediate and Offshore, and one between the Intermediate/Coastal Gulf node and Offshore. This tree is well supported and consistent across runs (100 runs, nearly all show this exact tree, below).
In contrast, with 6 populations, things are much more uncertain and unstable. For the most likely tree (top left in the figure below), there are migration edges between the intermediate populations and the branch leading to the coastal populations. There is also migration from the Coastal Gulf/Intermediate node to both offshore populations. The next 5 most likely trees show similar variations on these migration events. Note that the positions of the coastal and intermediate populations are unstable across runs. This is perhaps not surprising, given that these populations aren't well supported in the other analyses.
Results that are consistent:
The basic idea behind these is that we can identify early-generation hybrids by both their ancestry and their (interclass) heterozygosity. We treat loci with highly divergent allele frequencies between the parental populations (frequency difference > 0.7; 0.8 and 0.9 give similar results) as Ancestry Informative Markers (AIMs). An F1 hybrid would then have a hybrid index of 0.5 based on these AIMs (50% of alleles from each parental population). We also calculate the proportion of AIMs in the putative hybrids that are heterozygous for ancestry, i.e. carry one allele from each parent. For an F1, all loci would be heterozygous, so this value would be 1. In an F2, this heterozygosity would drop to ~0.5, and it would continue to drop if there is backcrossing.
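A minimal sketch of these two quantities follows. It assumes genotypes at each AIM have already been recoded as the number of alleles (0, 1, or 2) assigned to parental population 2, and it ignores missing data; the function names, the recoding, and the toy values are illustrative assumptions, not the code used for the analysis.

```python
def select_aims(freq_p1, freq_p2, threshold=0.7):
    """Indices of loci whose allele-frequency difference between the two
    parental populations exceeds the threshold."""
    return [
        i for i, (a, b) in enumerate(zip(freq_p1, freq_p2))
        if abs(a - b) > threshold
    ]

def hybrid_index_and_het(ancestry_counts):
    """Hybrid index and interclass heterozygosity for one individual.

    ancestry_counts: per-AIM count (0, 1, 2) of alleles assigned to
    parental population 2 (assumed recoding, no missing data).
    """
    n = len(ancestry_counts)
    hybrid_index = sum(ancestry_counts) / (2 * n)
    interclass_het = sum(1 for g in ancestry_counts if g == 1) / n
    return hybrid_index, interclass_het

# Toy examples (made up)
aims = select_aims([0.95, 0.5, 0.1], [0.1, 0.4, 0.9])   # loci 0 and 2 qualify
f1_like = hybrid_index_and_het([1, 1, 1, 1])            # heterozygous everywhere
parental = hybrid_index_and_het([0, 0, 0, 0])           # pure parent 1
```

Plotting interclass heterozygosity against hybrid index for each individual gives the triangle-style plot in which an F1 sits at the apex (0.5, 1) and parental individuals sit at the bottom corners.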
Here’s a nice paper that shows expectations for different scenarios: https://onlinelibrary.wiley.com/doi/10.1111/1755-0998.14039. In short, if the data follow the expectation curve in the plot below, the pattern is likely due to admixture and not isolation by distance or a similar process. In contrast, there should be no relationship between hybrid index and heterozygosity when admixture has not occurred and IBD is the main feature of the data.
There are no F1 hybrids in these data, but the rest of the
variation is likely due to admixture, not neutral IBD.